The psychological reality of rhythm classes: Perceptual studies
Linguists have traditionally classified languages into three
rhythm classes, namely stress-timed, syllable-timed and mora-timed languages. However, this classification has remained controversial for various reasons: reliable acoustic cues to the different rhythm types have long proved elusive; some languages are claimed to belong to none of the three classes; and few perceptual studies have bolstered the notion. We have previously proposed an acoustic/phonetic model of the different types of linguistic rhythm, and of their categorisation as such by
listeners. Here, we present perceptual experiments that directly test the notion of rhythm classes, our model's predictions, and the question of intermediate languages. Language discrimination experiments were run using a speech resynthesis technique to ensure that only rhythmic cues were available to the subjects. Languages investigated were English, Dutch, Spanish, Catalan and Polish. Our results are consistent with the idea that English and Dutch are stress-timed, Spanish and Catalan are syllable-timed,
but Polish seems to be different from any other language studied and thus may constitute a new rhythm class. We propose that perceptual studies tapping the ability to discriminate languages' rhythms are the proper way to generate more empirical data relevant to rhythm typology.
A Temporal Coherence Loss Function for Learning Unsupervised Acoustic Embeddings
We train neural networks of varying depth with a loss function that constrains the output representations to have a temporal profile resembling that of phonemes. We show that a simple loss function which maximizes the dissimilarity between near frames and long-distance frames helps to construct a speech embedding that improves phoneme discriminability, both within and across speakers, even though the loss function only uses within-speaker information. However, with too deep an architecture, this loss function yields overfitting, suggesting the need for more data and/or regularization.
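The idea described above can be illustrated with a minimal numpy sketch. This is not the paper's implementation; the function name, the frame offsets, and the use of a simple distance difference are assumptions chosen to show the principle: an embedding scores well when nearby frames are similar and distant frames (roughly a phoneme length apart) are dissimilar.

```python
import numpy as np

def temporal_coherence_loss(emb, near=1, far=15):
    """Hypothetical sketch of a temporal coherence objective.

    emb: (T, d) array of frame embeddings. Frames `near` steps apart
    should be close; frames `far` steps apart should be distant.
    Lower values indicate a more phoneme-like temporal profile.
    """
    near_d = np.linalg.norm(emb[:-near] - emb[near:], axis=1).mean()
    far_d = np.linalg.norm(emb[:-far] - emb[far:], axis=1).mean()
    return near_d - far_d

rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=(200, 16)), axis=0)  # temporally smooth embedding
noise = rng.normal(size=(200, 16))                    # no temporal structure
```

A smooth random-walk embedding gets a lower (better) loss than unstructured noise, since its near-frame distances are small relative to its far-frame distances.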
Learning weakly supervised multimodal phoneme embeddings
Recent works have explored deep architectures for learning multimodal speech
representation (e.g. audio and images, articulation and audio) in a supervised
way. Here we investigate the role of combining different speech modalities,
i.e. audio and visual information representing the lips movements, in a weakly
supervised way using Siamese networks and lexical same-different side
information. In particular, we ask whether one modality can benefit from the
other to provide a richer representation for phone recognition in a weakly
supervised setting. We introduce mono-task and multi-task methods for merging
speech and visual modalities for phone recognition. The mono-task method
applies a Siamese network to the concatenation of the two modalities, while
the multi-task method receives several different
combinations of modalities at train time. We show that multi-task learning
enhances discriminability for visual and multimodal inputs while minimally
impacting auditory inputs. Furthermore, we present a qualitative analysis of
the obtained phone embeddings, and show that cross-modal visual input can
improve the discriminability of phonological features which are visually
discernable (rounding, open/close, labial place of articulation), resulting in
representations that are closer to abstract linguistic features than those
based on audio only.
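A minimal sketch of the weakly supervised setup can make the mono-task variant concrete. The function names, the cosine-based contrastive loss, and the margin value are assumptions for illustration, not the paper's architecture: the key ingredients are (a) lexical same/different side information driving a Siamese objective, and (b) concatenation of audio and visual features as a single input.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_different_loss(x1, x2, same, margin=0.5):
    # Siamese objective driven by lexical same/different side information:
    # pull "same" pairs toward cosine 1, push "different" pairs below a margin
    c = cosine(x1, x2)
    return (1.0 - c) if same else max(0.0, c - margin)

def mono_task_input(audio, visual):
    # mono-task variant: one network sees the concatenated modalities
    return np.concatenate([audio, visual])
```

In the multi-task variant, the same Siamese loss would instead be applied to several modality combinations (audio only, visual only, both) sampled at train time.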
Occlusion resistant learning of intuitive physics from videos
To reach human performance on complex tasks, a key ability for artificial
systems is to understand physical interactions between objects, and predict
future outcomes of a situation. This ability, often referred to as intuitive
physics, has recently received attention and several methods were proposed to
learn these physical rules from video sequences. Yet, most of these methods are
restricted to the case where no, or only limited, occlusions occur. In this
work we propose a probabilistic formulation of learning intuitive physics in 3D
scenes with significant inter-object occlusions. In our formulation, object
positions are modeled as latent variables enabling the reconstruction of the
scene. We then propose a series of approximations that make this problem
tractable. Object proposals are linked across frames using a combination of a
recurrent interaction network, modeling the physics in object space, and a
compositional renderer, modeling the way in which objects project onto pixel
space. We demonstrate significant improvements over the state of the art on
the IntPhys intuitive physics benchmark. We apply our method to a second
dataset with increasing levels of occlusion, showing that it realistically
predicts segmentation masks up to 30 frames into the future. Finally, we also
show results on predicting the motion of objects in real videos.
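The occlusion-handling idea in the formulation above can be sketched minimally. This is an assumption-laden toy, not the paper's model: a constant-velocity step stands in for the recurrent interaction network, and a Gaussian likelihood over visible objects only stands in for the compositional renderer. The point it illustrates is that occluded objects contribute nothing to the observation term, so their latent positions are constrained only by the physics.

```python
import numpy as np

def physics_step(pos, vel, dt=1.0):
    # stand-in for the recurrent interaction network: constant-velocity motion
    return pos + dt * vel

def render_nll(pred, obs, visible, sigma=1.0):
    # Gaussian negative log-likelihood scored over visible objects only;
    # occluded objects stay latent and add nothing to the observation term
    diff = pred[visible] - obs[visible]
    return float(0.5 * (diff ** 2).sum() / sigma ** 2)
```

With this split, tracking through an occlusion amounts to rolling the physics step forward while the renderer term is silent, then re-anchoring when the object reappears.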
Developmental Psychology: A Precursor of Moral Judgment in Human Infants?
Human infants evaluate social interactions well before they can speak, and show a preference for characters that help others over characters that hinder or fail to cooperate.
Phoneme learning is influenced by the taxonomic organization of the semantic referents
Word learning relies on the ability to master the sound contrasts that are phonemic (i.e., signal meaning differences) in a given language. Though the timeline of phoneme development has been studied extensively over the past few decades, the mechanism of this development is poorly understood. Previous work has shown that human learners rely on referential information to differentiate similar sounds, but has largely ignored the problem of taxonomic ambiguity at the semantic level (two different objects may be described by one or two words depending on how abstract the meaning intended by the speaker is). In this study, we varied the taxonomic distance of pairs of objects and tested how adult learners judged the phonemic status of the sound contrast associated with each of these pairs. We found that judgments were sensitive to gradients in the taxonomic structure, suggesting that learners use probabilistic information at the semantic level to optimize the accuracy of their judgments at the phonological level. The findings provide evidence for an interaction between phonological learning and meaning generalization, raising important questions about how these two important processes of language acquisition are related.
Epenthetic vowels in Japanese: A perceptual illusion?
In four cross-linguistic experiments comparing French and Japanese hearers, we found that the phonotactic properties of Japanese (a very reduced set of syllable types) induce Japanese listeners to perceive "illusory" vowels inside consonant clusters in VCCV stimuli. In Experiments 1 and 2, we used a continuum of stimuli ranging from no vowel (e.g. ebzo) to a full vowel between the consonants (e.g. ebuzo). Japanese, but not French, participants reported the presence of a vowel [u] between consonants, even in stimuli with no vowel. A speeded ABX discrimination paradigm was used in Experiments 3 and 4, and revealed that Japanese participants had trouble discriminating between VCCV and VCuCV stimuli. French participants, in contrast, had problems discriminating items that differ in vowel length (ebuzo vs. ebuuzo), a distinctive contrast in Japanese but not in French. We conclude that models of speech perception have to be revised to account for phonotactically-based assimilations.
On Doing Things Intentionally
Recent empirical and conceptual research has shown that moral considerations have an influence on the way we use the adverb 'intentionally'. Here we propose our own account of these phenomena, according to which they arise from the fact that the adverb 'intentionally' has three different meanings that are differently selected by contextual factors, including normative expectations. We argue that our hypotheses can account for most available data and present some new results that support this. We end by discussing the implications of our account for folk psychology.
Language and its acquisition: biological and psychological bases
Emmanuel Dupoux, directeur d'études. 1. Psychological bases of moral judgments. The study of the psychological and neurobiological bases of moral judgment has taken a fresh start over the past five years. This renewed interest stems from the confluence of three lines of research: 1) the study of emotions, 2) the neuroscience of social cognition, 3) the analogy between grammatical intuitions and moral intuitions. Regarding the first line, Haidt (2001) discovered that certain..